


PCT

WORLD INTELLECTUAL PROPERTY ORGANIZATION
International Bureau

INTERNATIONAL APPLICATION PUBLISHED UNDER THE PATENT COOPERATION TREATY (PCT)

(51) International Patent Classification 6:
     C12N 15/10, C12Q 1/68 // C12N 9/00          A1

(11) International Publication Number: WO 98/41622

(43) International Publication Date: 24 September 1998 (24.09.98)

(21) International Application Number: PCT/DK98/00103

(22) International Filing Date: 18 March 1998 (18.03.98)

(30) Priority Data:
     0306/97    18 March 1997 (18.03.97)    DK
     0433/97    17 April 1997 (17.04.97)    DK
     0623/97    30 May 1997 (30.05.97)      DK

(71) Applicant: NOVO NORDISK A/S [DK/DK]; Novo Allé,
     DK-2880 Bagsværd (DK).

(72) Inventors: BORCHERT, Torben, Vedel; Novo Nordisk A/S,
     Novo Allé, DK-2880 Bagsværd (DK). KRETZSCHMAR,
     Titus; Kasperlmühlstrasse 6, D-81739 Munich (DE).
     CHERRY, Joel, R.; 916 Anderson Road, Davis, CA 95616
     (US).

(81) Designated States: AL, AM, AT, AU, AZ, BA, BB, BG, BR,
     BY, CA, CH, CN, CU, CZ, DE, DK, EE, ES, FI, GB, GE,
     GH, GM, GW, HU, ID, IL, IS, JP, KE, KG, KP, KR, KZ,
     LC, LK, LR, LS, LT, LU, LV, MD, MG, MK, MN, MW,
     MX, NO, NZ, PL, PT, RO, RU, SD, SE, SG, SI, SK, SL, TJ,
     TM, TR, TT, UA, UG, UZ, VN, YU, ZW, ARIPO patent
     (GH, GM, KE, LS, MW, SD, SZ, UG, ZW), Eurasian patent
     (AM, AZ, BY, KG, KZ, MD, RU, TJ, TM), European patent
     (AT, BE, CH, DE, DK, ES, FI, FR, GB, GR, IE, IT, LU,
     MC, NL, PT, SE), OAPI patent (BF, BJ, CF, CG, CI, CM,
     GA, GN, ML, MR, NE, SN, TD, TG).

Published
     With international search report.

(54) Title: METHOD FOR CONSTRUCTING A LIBRARY USING DNA SHUFFLING

(57) Abstract

A method for the construction of a library of recombined polynucleotides from a number of different starting single or double stranded
parental DNA templates is disclosed, wherein the starting single or double stranded parental DNA templates represent discrete points in a
population of genes encoding evolutionary or synthetic homologues of a peptide having homologies ranging over a broad spectrum from
less than 15% to more than 80%, said population exhibiting at least one identification sequence, and whereby said genes are subjected to
a gene shuffling procedure to generate shuffled mutants of said population of genes representing additional discrete points between those
of said starting templates.



WO 98/41622 PCT/DK98/00103



Title : METHOD FOR CONSTRUCTING A LIBRARY USING DNA SHUFFLING 



FIELD OF THE INVENTION 

The present invention relates to optimizing DNA sequences
in order to (a) improve the properties of a protein of interest
by artificial generation of genetic diversity of genes encoding
proteins having a biological activity of interest by the use of
the so-called gene- or DNA shuffling technique to create a
large library of "genes", expressing said library of genes in a
suitable expression system and screening the expressed proteins
in respect of specific characteristics to determine such
proteins exhibiting desired properties or (b) improve the
properties of regulatory elements such as promoters or terminators by
generation of a library of these elements, transforming
suitable hosts therewith in operable conjunction with a structural
gene, expressing said structural gene and screening for
desirable properties in the regulatory element.



BACKGROUND OF THE INVENTION

It is generally found that a protein performing a certain
bioactivity exhibits a certain variation between genera, and
even between members of the same species differences may exist.
This variation is of course even more pronounced at the genomic
level.

This natural genetic diversity among genes coding for
proteins having basically the same bioactivity has been
generated in Nature over billions of years and reflects a natural
optimization of the proteins coded for in respect of the
environment of the organism in question.

In today's society the conditions of life are vastly
removed from the natural environment and it has been found that
the naturally occurring bioactive molecules are not optimized
for the various uses to which they are put by mankind,
especially when they are used for industrial purposes.

It has therefore been of interest to industry to identify
such bioactive proteins that exhibit optimal properties in
respect of the use for which they are intended.

This has for many years been done by screening of natural
sources, or by use of mutagenesis. For instance, within the
technical field of enzymes for use in e.g. detergents, the
washing and/or dishwashing performance of e.g. naturally
occurring proteases, lipases, amylases and cellulases has been
improved significantly by in vitro modifications of the enzymes.

In most cases these improvements have been obtained by
site-directed mutagenesis resulting in substitution, deletion
or insertion of specific amino acid residues which have been
chosen either on the basis of their type or on the basis of
their location in the secondary or tertiary structure of the
native enzyme (see for instance US patent no. 4,518,534).

In this manner the preparation of novel polypeptide
variants and mutants, such as novel modified enzymes with altered
characteristics, e.g. specific activity, substrate specificity,
thermal, pH and salt stability, pH-optimum, pI, Km, Vmax etc.,
has successfully been performed to obtain polypeptides with
improved properties.

For instance, within the technical field of enzymes the
washing and/or dishwashing performance of e.g. proteases,
lipases, amylases and cellulases has been improved
significantly.

An alternative general approach for modifying proteins
and enzymes has been based on random mutagenesis, for instance,
as disclosed in US 4,894,331 and WO 93/01285.

As it is a cumbersome and time consuming process to
obtain polypeptide variants or mutants with improved functional
properties, a few alternative methods for rapid preparation of
modified polypeptides have been suggested.

Weber et al., (1983), Nucleic Acids Research, vol. 11,
5661-5661, describes a method for modifying genes by in vivo
recombination between two homologous genes. A linear DNA
sequence comprising a plasmid vector flanked by a DNA sequence
encoding alpha-1 human interferon in the 5'-end and a DNA
sequence encoding alpha-2 human interferon in the 3'-end is
constructed and transfected into a rec A positive strain of E.
coli. Recombinants were identified and isolated using a
resistance marker.

Pompon et al., (1989), Gene 83, p. 15-24, describes a
method for shuffling gene domains of mammalian cytochrome P-450
by in vivo recombination of partially homologous sequences in
Saccharomyces cerevisiae by transforming Saccharomyces
cerevisiae with a linearized plasmid with filled-in ends, and a DNA
fragment being partially homologous to the ends of said
plasmid.

In WO 97/07205 a method is described whereby polypeptide
variants are prepared by shuffling different nucleotide
sequences of homologous DNA sequences by in vivo recombination
using plasmid DNA as template.

US patent no. 5,093,257 (Assignee: Genencor Int. Inc.)
discloses a method for producing hybrid polypeptides by in vivo
recombination. Hybrid DNA sequences are produced by forming a
circular vector comprising a replication sequence, a first DNA
sequence encoding the amino-terminal portion of the hybrid
polypeptide, and a second DNA sequence encoding the carboxy-terminal
portion of said hybrid polypeptide. The circular vector is
transformed into a rec positive microorganism in which the
circular vector is amplified. This results in recombination of
said circular vector mediated by the naturally occurring
recombination mechanism of the rec positive microorganism, which
include prokaryotes such as Bacillus and E. coli, and eukaryotes
such as Saccharomyces cerevisiae.

One method for the shuffling of homologous DNA sequences
has been described by Stemmer (Stemmer, (1994), Proc. Natl.
Acad. Sci. USA, Vol. 91, 10747-10751; Stemmer, (1994), Nature,
vol. 370, 389-391). The method concerns shuffling homologous
DNA sequences by using in vitro PCR techniques. Positive
recombinant genes containing shuffled DNA sequences are selected
from a DNA library based on the improved function of the
expressed proteins.

The above method is also described in WO 95/22625. WO
95/22625 relates to a method for shuffling of homologous DNA
sequences. An important step in the method described in WO
95/22625 is to cleave the homologous template double-stranded
polynucleotide into random fragments of a desired size followed
by homologously reassembling the fragments into full-length
genes.

A disadvantage inherent to the method of WO 95/22625 is,
however, that the diversity generated through that method is
limited due to the use of homologous gene sequences (as defined
in WO 95/22625).

Another disadvantage in the method of WO 95/22625 lies in
the production of the random fragments by the cleavage of the
template double-stranded polynucleotide.

A further reference of interest is WO 95/17413 describing
a method of gene or DNA shuffling by recombination of specific
DNA sequences - so-called design elements (DE) - either by
recombination of synthesized double-stranded fragments or
recombination of PCR generated sequences to produce so-called
functional elements (FE) comprising at least two of the design
elements. According to the method described in WO 95/17413 the
recombination has to be performed among design elements that
have DNA sequences with sufficient sequence homology to enable
hybridization of the different sequences to be recombined.

WO 95/17413 therefore also entails the disadvantage that
the diversity generated is relatively limited. Furthermore the
methods described are time consuming, expensive, and not suited
for automation.

Despite the existence of the above methods there is still
a need for better iterative in vivo recombination methods for
preparing novel polypeptide variants. Such methods should also
be capable of being performed in small volumes, and amenable to
automation.

Furthermore, there also is a need for methods providing
the possibility of being able to shuffle genes with relatively
low homology.

SUMMARY OF THE INVENTION

The present invention relates to a method for the
construction of a library of recombined polynucleotides from a
number of different starting single or double stranded parental
DNA templates, wherein said starting single or double stranded
parental DNA templates represent discrete points in a
population of genes encoding evolutionary or synthetic homologues of
a peptide having homologies ranging over a broad spectrum from
less than 15% to more than 80%, said population exhibiting at
least one identification sequence, and whereby said genes are
subjected to a gene shuffling procedure to generate shuffled
mutants of said population of genes representing additional
discrete points between those of said starting templates.

The gene shuffling procedure to be used according to the
invention can be any suitable method such as those described
above or a procedure as described in our co-pending patent
application filed on the same date, and outlined below.

According to that procedure template shifts of newly
synthesized DNA strands during in vitro DNA synthesis are utilized
to achieve DNA shuffling.
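
For illustration only, the template-shift principle can be mimicked with a toy simulation. The sketch below is not the procedure of the co-pending application. It assumes pre-aligned parental sequences of equal length, and it models each arrest of synthesis as a random stop after which the growing strand continues on a randomly chosen template. The function name, the `max_extension` parameter and the example sequences are hypothetical.

```python
import random


def shuffle_by_template_shift(parents, rng, max_extension=8):
    """Build one recombined strand by repeated short extensions.

    Toy model (assumption): all parents are pre-aligned and of equal
    length, so position i on one template corresponds to position i on
    every other template.
    """
    if not parents:
        raise ValueError("at least one parental template is required")
    length = len(parents[0])
    if any(len(p) != length for p in parents):
        raise ValueError("toy model expects equal-length, aligned parents")

    product = ""
    while len(product) < length:
        template = rng.choice(parents)          # anneal to some template
        burst = rng.randint(1, max_extension)   # extend until synthesis arrests
        product += template[len(product):len(product) + burst]
    return product


if __name__ == "__main__":
    rng = random.Random(0)
    toy_parents = ["A" * 30, "C" * 30, "G" * 30]  # invented sequences
    print(shuffle_by_template_shift(toy_parents, rng))
```

Each printed strand is a mosaic of segments copied from different parents; a smaller `max_extension` gives more crossover points per strand.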

In a further aspect the invention relates to a method of
identifying polypeptides exhibiting improved properties in
comparison to naturally occurring polypeptides of the same
bioactivity, whereby a library of recombined polynucleotides
produced by the above process is cloned into an appropriate
vector, said vector is then transformed into a suitable host
system, to be expressed into the corresponding polypeptides, said
polypeptides are then screened in a suitable assay, and
positive results selected.

In a still further aspect the invention relates to a
method for producing a polypeptide of interest as identified in
the preceding process, whereby a vector comprising a
polynucleotide encoding said polypeptide is transformed into a
suitable host, said host is grown to express said polypeptide, and
the polypeptide recovered and purified.

DEFINITIONS 

Prior to discussing this invention in further detail, the
following terms will first be defined.

"Shuffling": The term "shuffling" herein means
recombination of nucleotide sequence fragment(s) between two or more
polynucleotides resulting in output polynucleotides (i.e.
polynucleotides having been subjected to a shuffling cycle)
having a number of nucleotide fragments exchanged, in
comparison to the input polynucleotides (i.e. starting point
polynucleotides).

"Homology of DNA sequences or polynucleotides": In the
present context the degree of DNA sequence homology is
determined as the degree of identity between two sequences
indicating a derivation of the first sequence from the second. The
homology may suitably be determined by means of computer programs
known in the art, such as GAP provided in the GCG program
package (Program Manual for the Wisconsin Package, Version 8,
August 1994, Genetics Computer Group, 575 Science Drive,
Madison, Wisconsin, USA 53711) (Needleman, S.B. and Wunsch, C.D.,
(1970), Journal of Molecular Biology, 48, 443-453).

"Homologous": The term "homologous" means that one
single-stranded nucleic acid sequence may hybridize to a
complementary single-stranded nucleic acid sequence. The degree of
hybridization may depend on a number of factors including the
amount of identity between the sequences and the hybridization
conditions such as temperature and salt concentration as
discussed later (vide infra).

Using the computer program GAP (vide supra) with the
following settings for DNA sequence comparison: GAP creation
penalty of 5.0 and GAP extension penalty of 0.3, it is in the
present context believed that two DNA sequences will be able to
hybridize (using low stringency hybridization conditions as
defined below) if they mutually exhibit a degree of identity
preferably of at least 70%, more preferably at least 80%, and
even more preferably at least 85%.

"heterologous": If two or more DNA sequences mutually
exhibit a degree of identity which is less than above specified,
they are in the present context said to be "heterologous".

"Hybridization": Suitable experimental conditions for
determining if two or more DNA sequences of interest can
hybridize or not are herein defined as hybridization at low
stringency as described in detail below.

Molecules to which the oligonucleotide probe hybridizes
under these conditions are detected using an x-ray film or a
phosphoimager.

"primer": The term "primer" used herein, especially in
connection with a PCR reaction, is an oligonucleotide
(especially a "PCR-primer") defined and constructed according
to general standard specifications known in the art ("PCR: A
Practical Approach", IRL Press, (1991)).

"A primer directed to a sequence": The term "a primer
directed to a sequence" means that the primer (preferably to be
used in a PCR reaction) is constructed to exhibit at least 80%
degree of sequence identity to the sequence part of interest,
more preferably at least 90% degree of sequence identity to the
sequence part of interest, which said primer consequently is
"directed to". The primer is designed in order to specifically
anneal, at a given temperature, at the region it is directed
towards. Especially identity at the 3' end of the primer is
essential for the function of the polymerase, i.e. the ability of
a polymerase to extend the annealed primer.
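
The two requirements in this definition, overall identity and a matching 3' end, can be written as a small check. This is a minimal sketch, assuming the primer and the target region are given as equal-length, ungapped strings read 5' to 3' on the same strand. The default threshold follows the "more preferably at least 90%" figure above. The function names and the choice of three 3'-terminal bases are hypothetical and do not come from the definition.

```python
def percent_identity(a, b):
    """Percentage of identical positions in two equal-length sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def is_directed_to(primer, target, min_identity=90.0, three_prime_len=3):
    """True if the primer is sufficiently identical to the target and its
    last `three_prime_len` bases (an assumed cut-off) match exactly."""
    identity_ok = percent_identity(primer, target) >= min_identity
    three_prime_ok = primer[-three_prime_len:] == target[-three_prime_len:]
    return identity_ok and three_prime_ok


if __name__ == "__main__":
    target = "ACGTACGTAC"                        # invented target region
    print(is_directed_to("TCGTACGTAC", target))  # one 5' mismatch
    print(is_directed_to("ACGTACGTAG", target))  # one 3' mismatch
```

A mismatch at the 5' end is tolerated as long as the identity threshold is met, whereas the same single mismatch at the 3' end fails the check, which mirrors the emphasis the definition places on the 3' end.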

"flanking": The term "flanking" used herein in connection
with DNA sequences comprised in a PCR-fragment means the
outermost partial sequences of the PCR-fragment, both in the 5' and
3' ends of the PCR fragment.

"Polypeptide": Polymers of amino acids sometimes referred
to as protein. The sequence of amino acids determines the
folded conformation that the polypeptide assumes, and this in
turn determines biological properties such as activity. Some
polypeptides consist of a single polypeptide chain
(monomeric), whilst others comprise several associated
polypeptides (multimeric). All enzymes and antibodies are
polypeptides.

"Enzyme": A protein capable of catalysing chemical
reactions. Specific types of enzymes are a) hydrolases including
amylases, cellulases and other carbohydrases, proteases, and
lipases, b) oxidoreductases, c) ligases, d) lyases, e)
isomerases, f) transferases, etc. Of specific interest in
relation to the present invention are enzymes used in
detergents, such as proteases, lipases, cellulases, amylases, etc.



DETAILED DESCRIPTION OF THE INVENTION 

All possible genes encoding a polypeptide of the same
evolutionary origin can be seen as a very large population of
DNA sequences (e.g. {Gsp | the set of genes encoding a serine
protease}). It has been found that the homology between the
polypeptides encoded by single members of such a population may
be even as small as less than 15% (the genes originating from
"distant" organisms).

When searching for polypeptides suited for the various
purposes that mankind has developed, it has been found
difficult, if not impossible at our present level of knowledge, to
conclude in a rational manner on the optimal configuration of
the polypeptide in question. Therefore it was found desirable
to provide a simple method of generating a sub-population of
the above mentioned very large population, but representing a
substantial part of the variation possible within the large
population.

The object of the present invention is thus to provide a
method whereby it is possible to shuffle components of genes
encoding polypeptides of the same functionality, but having
only low homologies.

To this end it is necessary to obtain a reasonable
knowledge of the population in question, meaning having at disposal
a number of individual members (e.g. 5, 10, 15 or more
members) representing as high a variation as possible. This small
sub-population is then used as a starting point for generating
a much larger sub-population of genes. The corresponding
polypeptides of the large sub-population obtained are then
displayed and screened in an appropriate manner to identify such
members of the large sub-population that are optimal for the
intended purpose.

It was found that the expansion of the starting
sub-population to the large sub-population could be accomplished
using gene shuffling methods.

Such methods as described in the literature provide means
to exchange DNA fragments between genes coding for polypeptides
of a reasonably high homology, typically to be above 80%,
resulting in the generation of novel genes encoding polypeptides
having homologies between 80% and 99%.

It was also found that in the method of the invention it
was necessary as starting population to use genes encoding
polypeptides that are at least from 70% to 80% homologous to at
least one other gene in the starting population.

According to the invention it is thus important to start
from a population or sub-set of genes which comprises
intermediate sequences ranging from genes being rather similar to
genes being rather dissimilar, but still having the same
evolutionary origin (function). Only then is a shuffling of even rather
heterologous sequences feasible. The stepwise shuffling of,
first, quite homologous genes creates new species which are not
contained in the starting population, and which, in the
subsequent shuffling rounds, will recombine with each other and with
other more heterologous genes from the starting population, and
so on.

Finally, hybrids are generated in which sequence parts
from very heterologous starting genes can be found. These
retrieved starting genes would never have been shuffled without
having the intermediate species in the starting population
because of a too large "sequence space" distance.
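
The role of the intermediate species can be pictured as a graph problem: genes are nodes, and two genes are linked when their identity is high enough for direct shuffling. The sketch below is one possible way to check whether a starting population is "bridged" in this sense. It assumes pre-aligned, equal-length sequences and uses a 70% identity threshold. The function names and the four toy sequences are hypothetical.

```python
from collections import deque


def percent_identity(a, b):
    """Percentage of identical positions in two equal-length sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)


def shuffling_graph(genes, threshold=70.0):
    """Link every pair of genes whose identity reaches the threshold."""
    names = list(genes)
    graph = {name: set() for name in names}
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            if percent_identity(genes[first], genes[second]) >= threshold:
                graph[first].add(second)
                graph[second].add(first)
    return graph


def is_bridged(genes, threshold=70.0):
    """True if every gene is reachable from every other via such links."""
    graph = shuffling_graph(genes, threshold)
    if not graph:
        return True
    start = next(iter(graph))
    seen, queue = {start}, deque([start])
    while queue:  # breadth-first search over the identity links
        for neighbour in graph[queue.popleft()]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen == set(graph)


if __name__ == "__main__":
    toy = {  # invented sequences forming a chain A-B-C-D
        "A": "AAAAAAAAAA",
        "B": "AAAAAAACCC",
        "C": "AAAACCCCCC",
        "D": "ACCCCCCCCC",
    }
    print(is_bridged(toy))
    print(is_bridged({"A": toy["A"], "D": toy["D"]}))
```

In the toy set, A and D share only 10% identity, yet the population is bridged because B and C each sit within 70% of their neighbours. Removing the intermediates leaves A and D unconnected.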

Having this condition fulfilled it was found that it was
possible to generate genes encoding novel functional
polypeptides having a homology as low as the minimum degree of
homology represented in the starting population. In principle the
homology range in the final population may be even greater than
that for the starting population.

The present invention relates in its first aspect to a
method for the construction of a library of recombined
polynucleotides from a number of different starting single or double
stranded parental DNA templates, wherein said starting single
or double stranded parental DNA templates represent discrete
points in a population of genes encoding evolutionary or
synthetic homologues of a peptide having homologies ranging over a
broad spectrum from less than 15% to more than 80%, said
population exhibiting at least one identification sequence, and
whereby said genes are subjected to a gene shuffling procedure
to generate shuffled mutants of said population of genes
representing additional discrete points between those of said
starting templates.

According to the invention it is possible to use parental
DNA templates representing homologies ranging from less than
45%, 40%, 35%, 30%, 25%, 20%, or 15% to more than 80%, 85%,
90%, 95%, or 99%.

In specific embodiments at least one identification
sequence is identified and primers constructed to anneal thereto.
These sequences can be located anywhere on the genes.

In a preferred embodiment at least two identification
sequences are identified. These sequences can be located at any
distance from each other, but it is preferred that they are
located as far as possible from each other on the genes.

According to these embodiments said identification
sequences may correspond to an amino acid sequence of from 4 to 8
amino acid residues, which sequence is highly conserved among
the peptides encoded by the collection of starting single or
double stranded parental DNA templates, preferably from 5 to 7
amino acid residues.

It is preferred that the identification sequences are
located a distance apart corresponding to the average size of the
genes in said collection with a variation of up to 40%. The
further apart the sequences are, the larger the part of the gene
that is shuffled.

However, situations may arise where it is desired only
to shuffle the sequences between identification sequences
located quite close to each other.
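
Candidate identification sequences of this kind can be looked for as runs of fully conserved columns in a protein alignment. The following is a minimal sketch, assuming the peptides are already aligned to equal length with "-" as the gap character. The function name, the strict "identical in every sequence" criterion and the toy alignment are hypothetical. The default minimum length of four residues follows the lower bound given above.

```python
def conserved_runs(aligned, min_len=4):
    """Return (start, peptide) for each maximal run of columns that are
    identical and gap-free in every aligned sequence."""
    if not aligned:
        return []
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must have equal length")

    runs, start = [], None
    for i in range(length + 1):  # one extra step closes a trailing run
        conserved = (
            i < length
            and aligned[0][i] != "-"
            and all(seq[i] == aligned[0][i] for seq in aligned)
        )
        if conserved and start is None:
            start = i
        elif not conserved and start is not None:
            if i - start >= min_len:
                runs.append((start, aligned[0][start:i]))
            start = None
    return runs


if __name__ == "__main__":
    toy_alignment = [  # invented peptide fragments
        "MKTRYWDCCKPAL",
        "MRTRYWDCCKPGL",
        "MKSRYWDCCKPAV",
    ]
    print(conserved_runs(toy_alignment))
```

Runs near the two ends of the alignment would be the natural choices when as much of the gene as possible is to be shuffled. In practice a looser conservation criterion than strict identity may be needed for distant homologues.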

As indicated above the gene shuffling method used in the
method of the invention is of less or no significance. In
principle any method will work.

Thus the methods disclosed in WO 95/22625 and WO 95/17413
are fully operable in the present invention. Details showing
how these methods may be used for practising the present
invention are indicated in the Examples below.

Therefore further gene shuffling methods described in
co-filed patent applications are also contemplated for use in the
present method.

According to one of these procedures template shifts of
newly synthesized DNA strands during in vitro DNA synthesis are
utilized to achieve DNA shuffling.

More specifically that method provides for the
construction of a library of recombined homologous polynucleotides from
a number of different starting single or double stranded
parental DNA templates and primers by induced template shifts during
an in vitro polynucleotide synthesis using a polymerase,
whereby

A. extended primers or polynucleotides are synthesized by

   a) denaturing parental double stranded DNA templates to
      produce single stranded templates,

   b) annealing said primers to the single stranded DNA
      templates,

   c) extending said primers by initiating synthesis by use
      of said polymerase,

   d) causing arrest of the synthesis, and

   e) denaturing the double strand to separate the extended
      primers from the templates,

B. a template shift is induced by

   a) isolating the newly synthesized single stranded
      extended primers from the templates and repeating steps
      A.b) to A.e) using said extended primers produced in
      (A) as both primers and templates, or

   b) repeating steps A.b) to A.e),

C. the above process is terminated after an appropriate
   number of cycles of process steps A. and B.a), A. and B.b),
   or combinations thereof, and

D. optionally the produced polynucleotides are amplified in a
   standard PCR reaction with specific primers to selectively
   amplify polynucleotides of interest.

In a further specific embodiment the gene shuffling is
performed by the method described in our co-filed application,
whereby conserved regions of heterologous DNA sequences are
identified for shuffling of heterologous DNA sequences of
interest having at least one conserved region, comprising the
following steps:

i) One or more conserved region(s) (designated "A, B, C"
etc.) in two or more of the heterologous sequences are
identified.

ii) Two sets of PCR primers (each set comprising a sense and
an anti-sense primer) for one or more conserved region(s)
identified in (i) are constructed.

In these primers, one set (named: "a" = sense primer;
"a'" = anti-sense primer) is directed to a sequence region 5'
(sense strand) of the conserved region (e.g. conserved region
"A"), and the second set (named: "b" = sense primer; "b'" =
anti-sense primer) is directed to a sequence region 3' (sense
strand) of the conserved region (e.g. conserved region "A"),
and the anti-sense primer "a'" and the sense primer "b" have a
homologous sequence overlap of at least 10 base pairs (bp)
within the conserved region.

iii) For one or more identified conserved regions of interest
in step (i) two PCR amplification reactions are performed
using the heterologous DNA sequences from step (i) as templates,
whereby one of the PCR reactions uses the 5' primer set
identified in step (ii) (e.g. named "a", "a'") and the second
PCR reaction uses the 3' primer set identified in step
(ii) (e.g. named "b", "b'").

iv) The PCR fragments generated as described in step (iii)
for one or more of the identified conserved regions in step
(i) are isolated.

v) Two or more PCR fragments isolated in step (iv) are
pooled, and a Sequence overlap extension PCR reaction
(SOE-PCR) is performed using the isolated PCR fragments as templates.

vi) The PCR fragment(s) obtained in step (v) are isolated,
whereby the isolated PCR fragments comprise numerous different
shuffled sequences containing a shuffled mixture of the PCR
fragments isolated in step (iv).
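
The overlap requirement in step (ii) can be checked before any primers are ordered. The sketch below assumes the common geometry in which the sense-strand copy of the anti-sense primer "a'" ends inside the conserved region and the sense primer "b" starts inside it, so that the overlap is a suffix of the former matching a prefix of the latter. The function names and the example sequence are hypothetical. The default of 10 bp follows the text above.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an unambiguous DNA sequence (A, C, G, T)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def overlap_length(antisense_a, sense_b):
    """Longest suffix of the sense-strand copy of primer a' that is also
    a prefix of primer b; 0 if there is none."""
    a_on_sense = reverse_complement(antisense_a)
    b = sense_b.upper()
    for n in range(min(len(a_on_sense), len(b)), 0, -1):
        if a_on_sense.endswith(b[:n]):
            return n
    return 0


def primers_can_bridge(antisense_a, sense_b, min_overlap=10):
    """True if the two primers overlap by at least `min_overlap` bases."""
    return overlap_length(antisense_a, sense_b) >= min_overlap


if __name__ == "__main__":
    region = "ACCCGTTACTGGGATTGCTGCAAACCG"     # invented sense-strand stretch
    a_prime = reverse_complement(region[:18])  # anti-sense primer a'
    b = region[6:]                             # sense primer b
    print(overlap_length(a_prime, b), primers_can_bridge(a_prime, b))
```

Fragments amplified with such primers share the overlapping stretch, which is what allows the pooled 5' and 3' fragments from different templates to prime on each other in the overlap extension reaction of step (v).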

In specific embodiments various modifications can be made
in the process of the invention. For example it is advantageous
to apply a defective polymerase, either an error-prone
polymerase to introduce mutations in comparison to the templates,
or a polymerase that will discontinue the polynucleotide
synthesis prematurely to effect the arrest of the reaction.

According to a specific embodiment the peptide is a
protease, especially a subtilase.

In the case of a subtilase identification sequences may
be located around the aspartic acid in position 32, or the
histidine in position 64 and the active serine in position 221 of
subtilisin BPN'.

In a further embodiment the peptide is an amylase,
especially an α-amylase.

In that case identification sequences may be located
around the Asp in position 100 and the Asp in position 328 of
B. licheniformis α-amylase.

For α-amylases from Bacillus species the identification
sequences may preferentially be located around Tyr in position
8 and around Ser in position 476.

In further embodirr.ent s the peptide is a lipase, or a cel- 
lulase . 
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In respect of lipases, suitable identification sequences 
may be found by using the lipase alignment shown in A. Svendsen 
et ai. (1995) : Biochemical properties of cloned lipases frcr. 
the Pseudomonas family, Biochimica et Biophysics Acta 1259 5- 
5 17. Examples could be around the Pro in position 10 or around 
the His in position 285 (using P. glumae lipase numbering) . 

In respect of cellulases, in particular cellulases from
family 45 cellulases (see WO 96/29397), suitable identification
sequences may be the conserved region "Thr Arg Tyr Trp Asp Cys
Cys Lys Pro/Thr" and the conserved region "Trp Arg Phe/Tyr
Asp Trp Phe". For further details relating to those cellulase
identification sequences reference is made to (PCT
DK97/00216). See in particular example 3 of (PCT
DK97/00216).

In respect of xylanases, in particular xylanases from
family 11 xylanases, suitable identification sequences may be
the conserved regions "DGGTYDIY" and "EGYQSSG". For further
details relating to those xylanase identification sequences
reference is made to (PCT DK97/00216). See in particular
examples 1 and 2 of (PCT DK97/00216).
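
As an illustration of how such conserved regions might be located in a candidate protein sequence, the following sketch converts a motif written in three-letter code (with alternatives such as "Pro/Thr") into a regular expression and reports 1-based match positions. The helper names `motif_regex` and `find_motif` and the toy protein string are hypothetical; only the two cellulase motifs are taken from the text above.

```python
import re
from typing import List

THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def motif_regex(motif: str) -> "re.Pattern":
    """Translate e.g. 'Trp Arg Phe/Tyr' into the regex 'WR[FY]'."""
    parts = []
    for position in motif.split():
        letters = "".join(THREE_TO_ONE[res] for res in position.split("/"))
        parts.append("[" + letters + "]" if len(letters) > 1 else letters)
    return re.compile("".join(parts))


def find_motif(protein: str, motif: str) -> List[int]:
    """Return the 1-based start positions of all motif matches."""
    return [m.start() + 1 for m in motif_regex(motif).finditer(protein)]


# Toy protein invented for this example; not a real cellulase sequence.
toy_protein = "MKATRYWDCCKTSAWRYDWFGG"
first = find_motif(toy_protein, "Thr Arg Tyr Trp Asp Cys Cys Lys Pro/Thr")
second = find_motif(toy_protein, "Trp Arg Phe/Tyr Asp Trp Phe")
print(first, second)
```

The one-letter xylanase regions "DGGTYDIY" and "EGYQSSG" could be searched directly with `str.find`, since they contain no alternatives.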

PCR-primers:

The PCR primers are constructed according to the standard
descriptions in the art. Normally they are 10-75 base-pairs
(bp) long. However, for the specific embodiment using random or
semi-random primers the length may be substantially longer as
indicated above.

PCR-reactions:

If not otherwise mentioned, the PCR-reactions performed
according to the invention are performed according to standard
protocols known in the art.

The term "Isolation of PCR fragment" is intended to cover
as broad as simply an aliquot containing the PCR fragment.
However, preferably the PCR fragment is isolated to an extent which
removes surplus of primers, nucleotides, templates etc.

In an embodiment of the invention the DNA fragment(s)
is (are) prepared under conditions resulting in a low, medium or
high random mutagenesis frequency.

To obtain low mutagenesis frequency the DNA sequence(s)
(comprising the DNA fragment(s)) may be prepared by a standard
PCR amplification method (US 4,683,202 or Saiki et al., (1988),
Science 239, 487-491).

A medium or high mutagenesis frequency may be obtained by
performing the PCR amplification under conditions which
increase the misincorporation of nucleotides, for instance as
described by Deshler, (1992), GATA 9(4), 103-106; Leung et al.,
(1989), Technique, Vol. 1, No. 1, 11-15.

It is also contemplated according to the invention to
combine the PCR amplification (i.e. according to this
embodiment also DNA fragment mutation) with a mutagenesis step using
a suitable physical or chemical mutagenizing agent, e.g., one
which induces transitions, transversions, inversions,
scrambling, deletions, and/or insertions.
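
The effect of a low, medium or high mutagenesis frequency can be pictured with a small simulation. The sketch below is a toy model only: it introduces base substitutions independently at a fixed per-base probability, ignores insertions, deletions and inversions, and the rates 0.001, 0.01 and 0.05 are arbitrary illustrative values, not figures from the application or from the cited protocols. The names `mutate` and `count_differences` are hypothetical.

```python
import random


def mutate(seq: str, rate: float, rng: random.Random) -> str:
    """Substitute each base independently with probability `rate`."""
    out = []
    for base in seq:
        if rng.random() < rate:
            # Pick one of the three other bases.
            out.append(rng.choice([b for b in "ACGT" if b != base]))
        else:
            out.append(base)
    return "".join(out)


def count_differences(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


rng = random.Random(0)  # fixed seed so the run is repeatable
template = "".join(rng.choice("ACGT") for _ in range(1000))

# Arbitrary illustrative rates standing in for low, medium and high.
for label, rate in [("low", 0.001), ("medium", 0.01), ("high", 0.05)]:
    product = mutate(template, rate, rng)
    print(label, count_differences(template, product))
```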

Expressing the recombinant protein from the recombinant
shuffled sequences

Expression of the recombinant protein encoded by the
shuffled sequence in step vi) of the second and third aspect of
the present invention may be performed by use of standard
expression vectors and corresponding expression systems known in
the art.

Screening and selection

In the context of the present invention the term
"positive polypeptide variants" means resulting polypeptide
variants possessing functional properties which have been
improved in comparison to the polypeptides producible from the
corresponding input DNA sequences. Examples of such improved
properties can be as different as e.g. enhanced or lowered
biological activity, increased wash performance, thermostability,
oxidation stability, substrate specificity, antibiotic
resistance etc.

Consequently, the screening method to be used for
identifying positive variants depends on which property of the
polypeptide in question it is desired to change, and in which
direction the change is desired.

A number of suitable screening or selection systems to
screen or select for a desired biological activity are
described in the art. Examples are:

Strausberg et al. (Biotechnology 13: 669-673 (1995))
describes a screening system for subtilisin variants having
calcium-independent stability;

Bryan et al. (Proteins 1:326-334 (1986)) describes a
screening assay for proteases having enhanced thermal stability;
and

PCT-DK96/00322 describes a screening assay for lipases
having an improved wash performance in washing detergents.

An embodiment of the invention comprises screening or
selection of recombinant protein(s), wherein the desired
biological activity is performance in dish-wash or laundry detergents.
Examples of suitable dish-wash or laundry detergents are
disclosed in PCT-DK96/00322 and WO 95/30011.






If the improved functional property of the polypeptide is 
not sufficiently good after one cycle of shuffling, the 
polypeptide may be subjected to another cycle. 

In an embodiment of the invention wherein polynucleotides
representing a number of mutations of the same gene are used as
templates, at least one shuffling cycle is a backcrossing cycle
with the initially used DNA fragment, which may be the wild-type
DNA fragment. This eliminates non-essential mutations.
Non-essential mutations may also be eliminated by using wild-type
DNA fragments as the initially used input DNA material.

Also contemplated to be within the invention are
polypeptides having biological activity such as insulin, ACTH,
glucagon, somatostatin, somatotropin, thymosin, parathyroid hormone,
pituitary hormones, somatomedin, erythropoietin, luteinizing
hormone, chorionic gonadotropin, hypothalamic releasing factors,
antidiuretic hormones, thyroid stimulating hormone, relaxin,
interferon, thrombopoietin (TPO) and prolactin.

It is also contemplated according to the invention to
shuffle parental polynucleotides as indicated above originating
from wild type organisms of different genera.

The starting parental DNA sequences may be any DNA
sequences including wild-type DNA sequences, DNA sequences
encoding variants or mutants, or modifications thereof, such as
extended or elongated DNA sequences, and may also be the outcome
of DNA sequences having been subjected to one or more cycles of
shuffling (i.e. output DNA sequences) according to the method
of the invention or any other method (e.g. any of the methods
described in the prior art section), or synthetic sequences or
otherwise mutagenized sequences.

When using the method of the invention the resulting
recombined polynucleotides (i.e. shuffled DNA sequences) have
had a number of nucleotide fragments exchanged. This results in
replacement of at least one amino acid within the polypeptide
variant, if comparing it with the parent polypeptide. It is to
be understood that also silent exchanges are contemplated (i.e.
nucleotide exchange which does not result in changes in the
amino acid sequence).
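
The distinction between a silent exchange and an amino acid replacement can be made concrete with the standard genetic code. The sketch below compares a parent and a variant coding sequence codon by codon; it assumes both are in frame, of equal length and free of insertions or deletions. The function names and the two short sequences are hypothetical examples.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_exchanges(parent: str, variant: str):
    """List (codon number, parent aa, variant aa, kind) for changed codons."""
    if len(parent) != len(variant) or len(parent) % 3 != 0:
        raise ValueError("sequences must be in frame and of equal length")
    changes = []
    for pos in range(0, len(parent), 3):
        p_codon, v_codon = parent[pos:pos + 3], variant[pos:pos + 3]
        if p_codon == v_codon:
            continue
        p_aa, v_aa = CODON_TABLE[p_codon], CODON_TABLE[v_codon]
        kind = "silent" if p_aa == v_aa else "replacement"
        changes.append((pos // 3 + 1, p_aa, v_aa, kind))
    return changes


# Hypothetical parent (Met-Ala-Asp) and shuffled variant.
changes = classify_exchanges("ATGGCTGAT", "ATGGCCGTT")
print(changes)
```

Here the second codon changes from GCT to GCC, both encoding Ala (a silent exchange), while the third changes from GAT (Asp) to GTT (Val), an amino acid replacement.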



MATERIALS AND METHODS 
EXAMPLES

EXAMPLE 1

Shuffling of a pool/population of evolutionary homologues
originating from bacterial hosts.


In this Example a gene shuffling method similar to the
one described in WO 95/22625 is used:

A population of subtilase-encoding genes or parts of such
genes is generated through isolation or by synthesis. Sources
for the genes may be as described in Siezen et al., Protein
Engineering 4 (1991) 719-737. The population may also comprise
genes encoding the pre-pro subtilases as defined in GenBank
entries A13050_1, D26542, A22550, Swiss-Prot entry SUBT_BACAM
P00782, and PD498 (Patent Application No. WO 96/34963) with
homologies (similarities) ranging from 32% to 64% as calculated
by the MegAlign software from DNASTAR Inc. (WI 53715, USA)
using the Clustal Method.
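
To give a feel for how a pairwise homology figure of this kind is derived, the following sketch computes percent identity over the gap-free columns of two already-aligned sequences. This is a deliberate simplification and an assumption of this example: it does not reproduce the MegAlign/Clustal similarity calculation used above, and the function name and the two short aligned strings are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identical residues over columns where neither has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == "-" or y == "-":
            continue  # skip gapped columns
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


# Hypothetical pre-aligned peptide fragments ("-" marks a gap).
identity = percent_identity("MKT-AYIAK", "MRTLAY-SK")
print(round(identity, 1))
```

Applied to every pair in the starting population, a calculation of this type yields the spread of homologies that characterises the pool.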

The substrates used in the shuffling reaction are
represented by linear double stranded DNA generated by PCR
amplification using primers located at/directed towards the ends of
the DNA to be shuffled. In this instance the primers can
conveniently be constructed using the sequences surrounding the
histidine in position 64 of subtilisin BPN' and the serine in
position 221 of subtilisin BPN'. The template for this PCR can
either be plasmids containing cloned protease genes or
chromosomal DNA extracted from bacterial strains, e.g. protease
secreting bacteria isolated from soil. The substrate will typically
be generated separately for all the templates and pooled before
the shuffling reaction.

The substrates are fragmented e.g. by DNAse I treatment
or shearing by sonication as described in WO 95/22625. The
generated fragments are separated according to size by agarose gel
electrophoresis and generated fragments of the desired size,
e.g. from 10 to 50 bp, or from 30 to 100 bp, or from 50 to 150
bp, or from 100 to 200 bp are purified from the gel.
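
The fragmentation and size-selection step can be mimicked in silico as follows. This is a toy model under stated assumptions: cuts fall at uniformly random positions (real DNAse I digestion or sonication is not uniform), the substrate is a random string, and the function names `fragment` and `size_select` are hypothetical. Only the 50 to 150 bp window is taken from the text above.

```python
import random
from typing import List


def fragment(seq: str, n_cuts: int, rng: random.Random) -> List[str]:
    """Cut `seq` at `n_cuts` distinct random internal positions."""
    cut_sites = sorted(rng.sample(range(1, len(seq)), n_cuts))
    bounds = [0] + cut_sites + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:])]


def size_select(fragments: List[str], min_bp: int, max_bp: int) -> List[str]:
    """Keep fragments whose length lies in the window, as after gel purification."""
    return [f for f in fragments if min_bp <= len(f) <= max_bp]


rng = random.Random(42)  # fixed seed so the run is repeatable
substrate = "".join(rng.choice("ACGT") for _ in range(1200))
pieces = fragment(substrate, 11, rng)
selected = size_select(pieces, 50, 150)
print(len(pieces), [len(p) for p in selected])
```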

These fragments are reassembled by PCR as described in
WO 95/22625. Optionally, correctly assembled DNA fragments are
amplified by subjecting the product from the assembly reaction
to another PCR including two primers able to anneal to the ends
of correctly assembled fragments. The resulting fragments can
be cloned into suitable expression plasmids, and subsequently
screened for a specific property, such as thermostability, using
assays well known in the art.

PATENT CLAIMS



1. A method for the construction of a library of recombined
polynucleotides from a number of different starting single or
double stranded parental DNA templates, wherein said starting
single or double stranded parental DNA templates represent
discrete points in a population of genes encoding evolutionary or
synthetic homologues of a peptide having homologies ranging
over a broad spectrum from less than 15% to more than 80%, said
population exhibiting at least one identification sequence, and
whereby said genes are subjected to a gene shuffling procedure
to generate shuffled mutants of said population of genes
representing additional discrete points between those of said
starting templates.

2. The method of claim 1, wherein said homologies range from
less than 45%, 40%, 35%, 30%, 25%, 20%, or 15% to more than
80%, 85%, 90%, 95%, or 99%.

3. The method of claim 1 or 2, wherein said starting popula- 
tion exhibits at least two identification sequences. 

4. The method of any of the claims 1 to 3, wherein said
identification sequences correspond to amino acid sequences of
from 4 to 8 amino acid residues, which sequence is highly
conserved among the peptides encoded by said collection of
starting single or double stranded parental DNA templates,
preferably from 5 to 7 amino acid residues.

5. The method of claim 3 or 4, wherein said identification
sequences are located a distance apart corresponding to the
average size of the genes in said collection with a variation of
up to 40%.

6. The method of claim 3, wherein said variation is 20%,
15%, 10%, or 5%.

7. A method of identifying a polypeptide of interest
exhibiting improved properties in comparison to naturally occurring
or other known polypeptides of the same activity, whereby a
population of recombined polynucleotides produced by a process
according to any of the claims 1 to 6 is cloned into an
appropriate vector, said vector is transformed into a suitable host
system, to be expressed into the corresponding polypeptides,
and said polypeptides are screened in a suitable assay, and
positive polypeptides selected.

8. A method for producing a polypeptide of interest as
identified according to claim 7, whereby a vector comprising a
polynucleotide encoding said identified polypeptide is
transformed into a suitable host, said host is grown to express said
polypeptide, and the polypeptide recovered and purified.

9. The method of claim 3, wherein said peptide is a
protease, especially a subtilase.

10. The method of claim 9, wherein said identification
sequences are located around the histidine in position 64 and the
active serine in position 221 of subtilisin BPN'.

11. The method of claim [illegible], wherein said peptide is an
amylase, especially an α-amylase.






12. The method of claim 11, wherein said identification
sequences are located around the Asp in position 100 and the Asp
in position 328 of B. licheniformis α-amylase.

12. The method of claim 11, wherein said identification
sequences are located around the Tyr in position 8 and around Ser
in position 476 of B. licheniformis α-amylase.

13. The method of claim 8, wherein said peptide is a lipase. 

14. The method of claim 13, wherein said identification
sequences are located around the Pro in position 10 and around
the His in position 285 of P. glumae lipase.



15. The method of claim 8, wherein said peptide is a
cellulase.



15. The method of claim 8, wherein said peptide is a
xylanase.